1. Introduction

In the study of genome rearrangement, one often views genomes as signed
permutations, where each integer corresponds to a unique gene/marker and
the sign corresponds to its orientation. For unichromosomal genomes, the
most frequent rearrangements are reversals. A reversal $\rho(i,j)$ applied to a
permutation $\pi = \pi_1 \cdots \pi_{i-1} \pi_i \cdots \pi_j \pi_{j+1} \cdots \pi_n$ reverses the segment
$\pi_i \cdots \pi_j$ to obtain a new permutation
$\pi_1 \cdots \pi_{i-1}\, {-\pi_j}\, {-\pi_{j-1}} \cdots {-\pi_i}\, \pi_{j+1} \cdots \pi_n$. For
instance the reversal $\rho(2,4)$ would send $4\ 1\ {-5}\ {-2}\ 3$ to $4\ 2\ 5\ {-1}\ 3$. In
this context, the minimum parsimony distance of a signed permutation $\pi$
is defined as the minimum number of reversals needed to bring the identity
permutation to $\pi$. Hannenhalli and Pevzner [HaPe] found an exact combinatorial
formula for the minimum parsimony distance of a signed permutation;
for this and other results, see the book [F].
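The reversal operation just defined is easy to make concrete. The following is a minimal sketch; the function name `apply_reversal` and the list representation of a signed permutation are illustrative assumptions, not notation from the paper.

```python
def apply_reversal(perm, i, j):
    """Apply the reversal rho(i, j) to a signed permutation.

    perm is a list of nonzero integers; i and j are 1-indexed and inclusive,
    matching the convention in the text. (Hypothetical helper.)
    """
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    # Reverse the order of the segment and flip the sign of each entry in it.
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


# The example from the text: rho(2, 4) applied to 4 1 -5 -2 3.
print(apply_reversal([4, 1, -5, -2, 3], 2, 4))
```

Applying the same reversal twice returns the original signed permutation, which is why a reversal walk can be undone step by step.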

A fundamental problem, emphasized in [BoPe] and the survey [Du], is to
understand the distribution of the minimum parsimony distance after a given
number of reversals has occurred. More generally for any shuffling technique
on permutations (which from now on we assume are unsigned so that we
are dealing with the symmetric group $S_n$), one can define the minimum
parsimony distance of a permutation $\pi$ as the number of shuffles needed to
bring the identity to $\pi$. It is then natural to study the distribution of the
minimum parsimony distance after a given number of shuffles has occurred.

An exciting recent work along these lines is the paper [BeDu], which studied
the random transposition walk on the symmetric group on $n$ symbols. Then
the minimum parsimony distance of $\pi$ is simply $n$ minus the number of cycles of
$\pi$. Letting $D_t$ be the minimum parsimony distance after $t$ iterations of
the random transposition walk, they showed that $D_{cn/2} \sim u(c)n$, where
$u$ is an explicit function satisfying $u(c) = c/2$ for $c \le 1$ and $u(c) < c/2$
for $c > 1$. They also described the fluctuation of $D_{cn/2}$ about its mean
in each of three regimes (subcritical where the fluctuations are Poisson,
critical, and supercritical where the fluctuations are normal). They exploit
a connection between the transposition walk and random graphs (about
which an enormous amount is known).
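For the transposition walk the distance has the closed form $n$ minus the number of cycles, which can be computed directly. This is a small sketch; the name `transposition_distance` and the tuple encoding (entry `i-1` holds $\pi(i)$) are assumptions made for illustration.

```python
def transposition_distance(perm):
    """Return n minus the number of cycles of perm.

    perm is a tuple with perm[i-1] = pi(i), using the symbols 1..n.
    (Hypothetical helper, not from the paper.)
    """
    n = len(perm)
    seen = [False] * n
    cycles = 0
    for start in range(n):
        if not seen[start]:
            cycles += 1
            j = start
            # Walk around the cycle containing `start`, marking its elements.
            while not seen[j]:
                seen[j] = True
                j = perm[j] - 1
    return n - cycles


print(transposition_distance((2, 3, 4, 1)))  # a single 4-cycle
```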

In the current paper we examine minimum parsimony distance for a more
vigorous shuffling method, the Gilbert-Shannon-Reeds riffle shuffle. While
we do not know if this is of biological interest, the mathematical ubiquity of
riffle shuffles (see the survey [Di] for an overview of connections to dynamical
systems, Lie theory and much else) as well as possible applications in casinos
more than justifies the question. Riffle shuffling proceeds as follows. Given
a deck of $n$ cards, one cuts it into 2 packets with probability of respective
pile sizes $j, n-j$ given by $\binom{n}{j}/2^n$. Then cards are dropped from the packets
with probability proportional to the packet size at a given time; thus if
the current packet sizes are $A_1, A_2$, the next card is dropped from packet
$i$ with probability $A_i/(A_1 + A_2)$. Bayer and Diaconis [BayDi] prove the
fundamental result that after $r$ riffle shuffles, the probability of obtaining the
permutation $\pi^{-1}$ is

$$\frac{\binom{n+2^r-d(\pi)-1}{n}}{2^{rn}}.$$

Here $d(\pi)$ denotes the number of descents
of $\pi$, that is $|\{i : 1 \le i \le n-1,\ \pi(i) > \pi(i+1)\}|$. For instance the
permutation 3 1 4 2 5 has two descents. From the Bayer-Diaconis formula
it is clear that the minimum parsimony distance of a permutation $\pi^{-1}$ is
simply $\lceil \log_2(d(\pi)+1) \rceil$. Thus the study of minimum parsimony distance
for riffle shuffles is the study of the distribution of $d(\pi)$ under the measure

$$\frac{\binom{n+2^r-d(\pi)-1}{n}}{2^{rn}}.$$
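The descent count, the Bayer-Diaconis probability and the resulting distance can be checked on small decks. The sketch below is illustrative only: the function names are assumptions, and `k` plays the role of $2^r$ in the formula above.

```python
from itertools import permutations
from math import comb


def descents(perm):
    """Number of positions i with perm(i) > perm(i+1)."""
    return sum(1 for a, b in zip(perm, perm[1:]) if a > b)


def riffle_prob(perm, k):
    """binom(n + k - d(perm) - 1, n) / k^n, the formula above with k = 2^r."""
    n = len(perm)
    return comb(n + k - descents(perm) - 1, n) / k ** n


def riffle_distance_of_inverse(perm):
    """ceil(log2(d(perm) + 1)): the least r with 2^r >= d(perm) + 1.

    For a nonnegative integer d this equals d.bit_length().
    """
    return descents(perm).bit_length()


print(descents((3, 1, 4, 2, 5)))
print(sum(riffle_prob(p, 2) for p in permutations(range(1, 5))))
```

Summing `riffle_prob` over all of $S_n$ should give 1 up to rounding, which is a quick sanity check that the formula defines a probability measure.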

More generally, for $k$ and $n$ integers, we let $R_{k,n}$ denote the measure on
$S_n$ which chooses $\pi$ with probability $\binom{n+k-d(\pi)-1}{n}/k^n$ and study the number of
descents. First let us review what is known. As $k \to \infty$, the distribution
$R_{k,n}$ tends to the uniform distribution on $S_n$. It is well known ([CKSS], [T])
that for $n \ge 2$ the number of descents has mean $\frac{n-1}{2}$ and variance $\frac{n+1}{12}$ and
that $\frac{d(\pi) - (n-1)/2}{\sqrt{(n+1)/12}}$ is asymptotically normal. Aldous proved that $\frac{3}{2}\log_2(n)$
riffle shuffles are necessary and suffice to be close to the uniform distribution
on $S_n$. Bayer and Diaconis [BayDi] give more refined asymptotics, proving
that for $k = 2^c n^{3/2}$ with $c$ a real number,

$$\frac{1}{2}\sum_{\pi \in S_n}\left|R_{k,n}(\pi) - \frac{1}{n!}\right| = 1 - 2\Phi\left(\frac{-2^{-c}}{4\sqrt{3}}\right) + O(n^{-1/4})$$

where

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt.$$

Motivated by this result, Mann [Ma] proved that if $k = an^{3/2}$, with $a$ fixed,
then the number of descents has mean $\frac{n-1}{2} - \frac{n^{1/2}}{12a} + O(1)$ and variance
$\frac{n+1}{12} + O(n^{1/2})$, and is asymptotically normal as $n \to \infty$. He deduces this
from Tanny's local limit theorem for $d(\pi)$ under the uniform distribution
[T] and from the formula for $R_{k,n}$.

We prove two new results concerning the distribution of $d(\pi)$ under the
measure $R_{k,n}$. First, we complement the above results on normal approximation
by using Stein's method to upper bound the total variation distance
between the distribution of $k-1-d(\pi)$ and a Poisson variable with mean
$\frac{k}{n+1}$; our bound shows the approximation to be good when $\frac{k}{n}$ is small. Second,
we use generating functions to give very precise asymptotic estimates
for the mean and variance of $d(\pi)$ when $\frac{k}{n} > \frac{1}{2\pi}$. For instance if $k = \alpha n$
with $\alpha > \frac{1}{2\pi}$, we show that

$$\left| E_{R_{k,n-1}}(d+1) - n\left(\alpha - \frac{1}{e^{1/\alpha}-1}\right) - \frac{e^{1/\alpha}(-2\alpha e^{1/\alpha} + 2\alpha + e^{1/\alpha} + 1)}{2\alpha^2(e^{1/\alpha}-1)^3} \right|$$

is at most $\frac{C_\alpha}{n}$ and that

$$\left| \mathrm{Var}_{R_{k,n-1}}(d) - n\,\frac{e^{1/\alpha}(\alpha^2 e^{2/\alpha} + \alpha^2 - 2\alpha^2 e^{1/\alpha} - e^{1/\alpha})}{\alpha^2(e^{1/\alpha}-1)^4} \right| \le A_\alpha$$

where $C_\alpha, A_\alpha$ are constants depending on $\alpha$ (and are independent of $\alpha$ for
$\alpha \ge 1$). Mann [Ma] had exact expressions for the mean and variance (which
we derive another way) but only obtained asymptotics when $k = an^{3/2}$
with $a$ fixed. The main technical point and effort of the paper [SGO] on
information loss in card shuffling was to obtain asymptotics for the mean
and variance of $d(\pi)$ under the measure $R_{k,n}$. By a clever application of
the method of indicator variables, they obtain bounds, but ours are much
better.

Next let us describe the technique we use to study the distribution of $d(\pi)$
under $R_{k,n}$, as we believe this to be as interesting as the result itself. To
apply Stein's method to study a statistic $W$, one often uses an exchangeable
pair $(W, W')$ of random variables (this means that the distribution of
$(W, W')$ is the same as that of $(W', W)$) such that the conditional expectation
$E(W'|W)$ is approximately $(1-\lambda)W$. Typically to construct such a
pair one would use a Markov chain on $S_n$ which is reversible with respect
to the measure $R_{k,n}$, choose $\pi$ from $R_{k,n}$, let $\pi'$ be obtained from $\pi$ by
one step in the chain, and finally set $(W, W') = (W(\pi), W(\pi'))$. For the
problem at hand this does not seem easy. Thus we modify the problem;
instead of considering the measure $R_{k,n}$, we consider a measure $C_{k,n}$ which
chooses a permutation $\pi$ with probability $\binom{n+k-c(\pi)-1}{n-1}/(nk^{n-1})$. Here $c(\pi)$ is the number
of cyclic descents of $\pi$, defined as $d(\pi)$ if $\pi(n) < \pi(1)$ and as $d(\pi)+1$ if
$\pi(n) > \pi(1)$. This probability measure was introduced in [Fu1] and $C_{2^r,n}(\pi)$
gives the chance of obtaining $\pi^{-1}$ after first cutting the deck at a uniformly
chosen random position and then performing $r$ iterations of a riffle shuffle.
The advantage of working with $C_{k,n}$ is that it has a natural symmetry
which leaves it invariant, since performing two consecutive cuts at random
positions is the same as performing a single cut. As will be explained in
Section 2 this symmetry leads to an exchangeable pair $(d, d')$ with the very
convenient property that $E_C(d'|\pi)$ is approximately $(1-\frac{1}{n})d$. We obtain
a Poisson approximation theorem for $d$ under the measure $C_{k,n}$. Although
the measures $R_{k,n}$ and $C_{k,n}$ are not close when $\frac{k}{n}$ is small (a main result of
[Fu3] is that the total variation distance between them is roughly $\frac{n}{4k}$
for $k \ge n$), we show that the distribution of $k-d$ under $C_{k,n}$ is close to the
distribution of $k-c$ under $C_{k,n}$ which in turn is equal to the distribution of
$k-1-d$ under $R_{k,n-1}$. This implies a Poisson approximation theorem for
the original problem of interest.

Incidentally, it is proved in [Fu1] that $r$ iterations of "cut and then riffle
shuffle" yield exactly the same distribution as performing a single cut and
then iterating $r$ riffle shuffles. Thus the chance of $\pi^{-1}$ after $r$ iterations
of "cut and then riffle shuffle" is $C_{2^r,n}(\pi)$, which implies that the minimum
parsimony distance of the "cut and then riffle shuffle" process is $\lceil \log_2(c(\pi)) \rceil$.
Thus the study of minimum parsimony distance for the "cut and then riffle
shuffle" procedure is equivalent to the study of $c$ under the distribution
$C_{k,n}$. But as mentioned in the last paragraph, we will prove that this is
the same as the distribution of $d+1$ under $R_{k,n-1}$. Hence the theory of
minimum parsimony distance for "cut and then riffle shuffle" is equivalent
to the theory for riffle shuffles, and we shall say nothing more about it.

The reader may wonder why we don't apply our exchangeable pair for
normal approximation. Most theorems for Stein's method for normal approximation
assume that the pair $(W, W')$ satisfies the property $E(W'|W) =
(1-\lambda)W$ for some $\lambda$. In our case this only approximately holds, that is
$E(W'|W) = (1-\lambda)W + G(W)$ where $G(W)$ is small. There are normal
approximation results in the literature ([RR], [Ch]) for dealing with this
situation, but they require that $\frac{E|G(W)|}{\lambda}$ goes to 0. Using interesting properties
of Eulerian numbers, we show that even for the uniform distribution
(the $k \to \infty$ limit of $C_{k,n}$) the quantity $\frac{E|G(W)|}{\lambda}$ is bounded away from 0.
Finding a version of Stein's method which allows normal approximation for
our exchangeable pair (even for the uniform distribution) is an important
open problem. Incidentally, for the special case of the uniform distribution,
it is possible to prove a central limit theorem for $d$ by Stein's method [Fu2],
using a different exchangeable pair.

Having described the main motivations and ideas of the paper, we describe
its organization. Section 2 defines an exchangeable pair to be used in the
study of $d(\pi)$ under the measure $C_{k,n}$, and develops a number of properties
of it. It also gives closed formulas (but not asymptotics) for the mean
and variance of $d(\pi)$, by relating them to the mean and variance of $c(\pi)$,
and computing the latter using generating functions. Section 3 uses the
exchangeable pair of Section 2 to prove a Poisson approximation theorem
for $k - d(\pi)$ under the measure $C_{k,n}$ (the bounds are valid for all integer
values of $k$ and $n$ but informative only when $\frac{k}{n}$ is small). It then shows how
to deduce from this a Poisson approximation theorem for $k-1-d(\pi)$ under
the measure $R_{k,n}$. Section 4 gives asymptotics for the mean and variance for
$c(\pi)$ under $C_{k,n}$ for $\frac{k}{n} > \frac{1}{2\pi}$ (and so also for $d(\pi)$ under $C_{k,n}$ and $R_{k,n}$). It
then explores further properties of the exchangeable pair which are related
to normal approximation. Finally, it gives a quick algorithm for sampling
from $R_{k,n}$, which should be useful in empirically investigating the nature of
the transition from Poisson to normal behavior.

2. The Exchangeable pair, mean, and variance

This section constructs an exchangeable pair $(d, d')$ for the measure $C_{k,n}$
and develops some of its properties. Throughout we let $E_C$ denote expectation
with respect to $C_{k,n}$. We relate $E_C(d)$ and $E_C(d^2)$ to $E_C(c)$ and $E_C(c^2)$,
and then use generating functions to find expressions (whose asymptotics
will be studied later) for $E_C(c)$ and $E_C(c^2)$.

To begin let us construct an exchangeable pair $(d, d')$. We represent permutations
$\pi$ in two line form. Thus the permutation represented by

$i$ : 1 2 3 4 5 6 7
$\pi(i)$ : 6 4 1 5 3 2 7

sends 1 to 6, 2 to 4, and so on. One constructs a permutation $\pi'$ by choosing
uniformly at random one of the $n$ cyclic shifts of the symbols in the bottom
row of the two line form of $\pi$. For instance with probability 1/7 one obtains
the permutation $\pi'$ which is represented by

$i$ : 1 2 3 4 5 6 7
$\pi'(i)$ : 5 3 2 7 6 4 1

An essential point is that if $\pi$ is chosen from the measure $C_{k,n}$, then so is
$\pi'$; note that this would not be so for the measure $R_{k,n}$. Thus if one chooses
$\pi$ from $C_{k,n}$, defines $\pi'$ as above, and sets $(d, d') = (d(\pi), d(\pi'))$, it follows
that $(d, d')$ is exchangeable with respect to the measure $C_{k,n}$. Observe also
that $d' - d \in \{0, \pm 1\}$.
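The cyclic-shift construction can be explored by brute force on small symmetric groups. The sketch below assumes the measure $C_{k,n}(\pi) = \binom{n+k-c(\pi)-1}{n-1}/(nk^{n-1})$ from the introduction; the function names are hypothetical. It illustrates that a cyclic shift of the bottom row preserves the number of cyclic descents (so it preserves the measure) and changes the number of descents by at most one.

```python
from itertools import permutations
from math import comb


def descents(p):
    """Number of positions i with p(i) > p(i+1)."""
    return sum(1 for a, b in zip(p, p[1:]) if a > b)


def cyclic_descents(p):
    """Descents, plus one more if the last entry exceeds the first."""
    return descents(p) + (1 if p[-1] > p[0] else 0)


def cut_riffle_prob(p, k):
    """Assumed form of C_{k,n}(p): binom(n+k-c-1, n-1) / (n k^(n-1))."""
    n = len(p)
    return comb(n + k - cyclic_descents(p) - 1, n - 1) / (n * k ** (n - 1))


def cyclic_shifts(p):
    """All n cyclic shifts of the bottom row of the two line form."""
    return [p[s:] + p[:s] for s in range(len(p))]


pi = (6, 4, 1, 5, 3, 2, 7)
print(cyclic_shifts(pi)[3])  # the shifted bottom row displayed in the text
```

Choosing one element of `cyclic_shifts(pi)` uniformly at random gives $\pi'$; since the weight depends on $\pi$ only through $c(\pi)$, every shift has the same probability as $\pi$ itself.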

Recall that $\pi$ is said to have a cyclic descent at position $j$ if either
$1 \le j \le n-1$ and $\pi(j) > \pi(j+1)$ or $j = n$ and $\pi(n) > \pi(1)$. It is helpful to
define random variables $\chi_j(\pi)$ ($1 \le j \le n$) where $\chi_j(\pi) = 1$ if $\pi$ has a cyclic
descent at position $j$ and $\chi_j(\pi) = 0$ if $\pi$ does not have a cyclic descent at
position $j$. We let $I$ denote the indicator function of an event. We also use
the standard notation that if $Y$ is a random variable, $E(Y|A)$ is the conditional
expectation of $Y$ given $A$.

Lemma 2.1.

$$E_C(d' - d \mid \pi) = -\frac{d}{n} + \frac{n-1}{n} I_{\chi_n(\pi)=1}.$$

Proof. Note that $d' = d+1$ occurs only if $\pi$ has a cyclic descent at $n$ and
that then it occurs with probability $\frac{n-1-d}{n}$. Note also that $d' = d-1$ occurs
only if $\pi$ does not have a cyclic descent at $n$, and that it then occurs with
probability $\frac{d}{n}$. To summarize,

$$E_C(d' - d \mid \pi) = -\frac{d}{n} I_{\chi_n(\pi)=0} + \frac{n-1-d}{n} I_{\chi_n(\pi)=1}
= -\frac{d}{n} + \frac{n-1}{n} I_{\chi_n(\pi)=1}.$$

□
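The identity $E_C(d'-d \mid \pi) = -\frac{d}{n} + \frac{n-1}{n} I_{\chi_n(\pi)=1}$ is a statement about averaging over the $n$ cyclic shifts of a fixed $\pi$, so it can be verified exhaustively with exact arithmetic. A minimal sketch (the helper name `lemma_2_1_holds` is an assumption):

```python
from fractions import Fraction
from itertools import permutations


def descents(p):
    """Number of positions i with p(i) > p(i+1)."""
    return sum(1 for a, b in zip(p, p[1:]) if a > b)


def lemma_2_1_holds(p):
    """Compare both sides of the conditional expectation identity for one p."""
    n = len(p)
    d = descents(p)
    # Left side: average of d(pi') - d(pi) over the n equally likely shifts.
    lhs = Fraction(sum(descents(p[s:] + p[:s]) - d for s in range(n)), n)
    # Right side: -d/n + (n-1)/n * indicator of a cyclic descent at position n.
    chi_n = 1 if p[-1] > p[0] else 0
    rhs = Fraction(-d, n) + Fraction(n - 1, n) * chi_n
    return lhs == rhs


print(all(lemma_2_1_holds(p) for p in permutations(range(1, 7))))
```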

As a corollary, we obtain $E_C(d)$ in terms of $E_C(c)$.

Corollary 2.2.

$$E_C(d) = \frac{n-1}{n} E_C(c).$$

Proof. Since $(d, d')$ is an exchangeable pair, $E_C(d' - d) = 0$. It follows that
$E_C(E_C(d' - d \mid \pi)) = 0$. So from Lemma 2.1

$$E_C(d) = (n-1) E_C(I_{\chi_n(\pi)=1}).$$

Since the variables $\chi_1(\pi), \cdots, \chi_n(\pi)$ have the same distribution under $C_{k,n}$,
and $c = \chi_1(\pi) + \cdots + \chi_n(\pi)$, the result follows. □

Lemma 2.3 will be helpful at several points in this paper.

Lemma 2.3.

$$E_C(d\, I_{\chi_n(\pi)=1}) = E_C\left(\frac{c(c-1)}{n}\right).$$

Proof. Observe that

$$E_C(d\, I_{\chi_n(\pi)=1}) = E_C((c-1) I_{\chi_n(\pi)=1})
= \frac{1}{n}\sum_{i=1}^{n} E_C((c-1) I_{\chi_i(\pi)=1})
= \frac{1}{n} E_C(c(c-1)).$$

□



As a consequence, we obtain $E_C(d^2)$ in terms of $E_C(c)$ and $E_C(c^2)$.

Corollary 2.4.

$$E_C(d^2) = \left(1 - \frac{2}{n}\right) E_C(c^2) + \frac{1}{n} E_C(c).$$

Proof.

$$E_C(c^2) = E_C\left((d + I_{\chi_n(\pi)=1})^2\right)
= E_C(d^2) + 2 E_C(d\, I_{\chi_n(\pi)=1}) + E_C(I_{\chi_n(\pi)=1})
= E_C(d^2) + \frac{2}{n} E_C(c^2 - c) + \frac{1}{n} E_C(c),$$

where the final equality is Lemma 2.3. This is equivalent to the statement
of the corollary. □

Next we use generating functions to compute $E_C(c)$ and $E_C(c^2)$. For this
some lemmas are useful.

Lemma 2.5. ([Fu1]) For $n > 1$, the number of elements in $S_n$ with $i$ cyclic
descents is equal to $n$ multiplied by the number of elements in $S_{n-1}$ with $i-1$
descents.

Lemma 2.6.

$$\frac{\sum_{\pi \in S_n} t^{c(\pi)}}{(1-t)^n} = n \sum_{m \ge 0} m^{n-1} t^m.$$

Proof. Given Lemma 2.5, the result now follows from the well known generating
function for descents (e.g. [FoS])

$$\frac{\sum_{\pi \in S_n} t^{d(\pi)+1}}{(1-t)^{n+1}} = \sum_{m \ge 0} m^n t^m.$$

□

Proposition 2.7 gives a closed formula for $E_C(c)$.

Proposition 2.7. For $n > 1$,

$$E_C(c) = k - \frac{n}{k^{n-1}} \sum_{j=1}^{k-1} j^{n-1}.$$

Proof. Multiplying the equation of Lemma 2.6 by $(1-t)^n$ and then differentiating
with respect to $t$, one obtains the equation

$$\sum_{\pi \in S_n} c(\pi) t^{c(\pi)-1} = n(1-t)^n \sum_{m \ge 0} m^n t^{m-1} - n^2 (1-t)^{n-1} \sum_{m \ge 0} m^{n-1} t^m.$$

Multiplying both sides by $\frac{t}{nk^{n-1}(1-t)^n}$ gives the equation

$$\frac{\sum_{\pi \in S_n} c(\pi) t^{c(\pi)}}{nk^{n-1}(1-t)^n} = \frac{1}{k^{n-1}} \sum_{m \ge 0} m^n t^m - \frac{n}{k^{n-1}(1-t)} \sum_{m \ge 0} m^{n-1} t^{m+1}.$$

The coefficient of $t^k$ on the left hand side is precisely the expected value of
$c$ under the measure $C_{k,n}$. The proposition now follows by computing the
coefficient of $t^k$ on the right hand side. □
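Extracting the coefficient of $t^k$ as described yields the closed form $E_C(c) = k - \frac{n}{k^{n-1}}\sum_{j=1}^{k-1} j^{n-1}$. The sketch below compares that closed form with a brute-force expectation over $S_n$, using exact rational arithmetic. The function names are hypothetical, and the measure is assumed to be $C_{k,n}(\pi) = \binom{n+k-c(\pi)-1}{n-1}/(nk^{n-1})$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def cyclic_descents(p):
    """Count i with p(i) > p(i+1), reading position n+1 as position 1."""
    return sum(1 for a, b in zip(p, p[1:] + p[:1]) if a > b)


def mean_c_bruteforce(n, k):
    """E_C(c) by summing c(pi) * C_{k,n}(pi) over all of S_n."""
    total = Fraction(0)
    for p in permutations(range(1, n + 1)):
        c = cyclic_descents(p)
        total += c * Fraction(comb(n + k - c - 1, n - 1), n * k ** (n - 1))
    return total


def mean_c_formula(n, k):
    """The closed form k - n k^{-(n-1)} * sum_{j=1}^{k-1} j^{n-1}."""
    return k - Fraction(n * sum(j ** (n - 1) for j in range(1, k)), k ** (n - 1))


print(mean_c_bruteforce(3, 2), mean_c_formula(3, 2))
```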

By a similar argument, one obtains an exact expression for $E_C(c^2)$.

Proposition 2.8. For $n > 1$,

$$E_C(c^2) = k^2 - \frac{n(n+1)}{k^{n-1}} \sum_{j=1}^{k-1} j^n + \frac{n(nk-n-k)}{k^{n-1}} \sum_{j=1}^{k-1} j^{n-1}.$$

Proof. From the proof of Proposition 2.7 we know that

$$\sum_{\pi \in S_n} c(\pi) t^{c(\pi)} = n(1-t)^n \sum_{m \ge 0} m^n t^m - n^2 (1-t)^{n-1} \sum_{m \ge 0} m^{n-1} t^{m+1}.$$

Differentiate with respect to $t$, multiply both sides by $\frac{t}{nk^{n-1}(1-t)^n}$, and take
the coefficient of $t^k$. On the left hand side one gets $E_C(c^2)$. On the right
hand side one obtains the coefficient of $t^k$ in

$$\frac{1}{k^{n-1}} \sum_{m \ge 0} m^{n+1} t^m - \frac{n}{k^{n-1}(1-t)} \sum_{m \ge 0} m^n t^{m+1}
- \frac{n}{k^{n-1}(1-t)} \sum_{m \ge 0} (m+1) m^{n-1} t^{m+1} + \frac{n(n-1)}{k^{n-1}(1-t)^2} \sum_{m \ge 0} m^{n-1} t^{m+2}.$$

After elementary simplifications the result follows. □
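The "elementary simplifications" lead to $E_C(c^2) = k^2 - \frac{n(n+1)}{k^{n-1}}\sum_{j<k} j^n + \frac{n(nk-n-k)}{k^{n-1}}\sum_{j<k} j^{n-1}$. As a sanity check, the following sketch compares this with a brute-force second moment on small cases. The names are illustrative assumptions, and the measure is again taken to be $C_{k,n}(\pi) = \binom{n+k-c(\pi)-1}{n-1}/(nk^{n-1})$.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def cyclic_descents(p):
    """Count i with p(i) > p(i+1), reading position n+1 as position 1."""
    return sum(1 for a, b in zip(p, p[1:] + p[:1]) if a > b)


def second_moment_bruteforce(n, k):
    """E_C(c^2) by direct summation over S_n."""
    total = Fraction(0)
    for p in permutations(range(1, n + 1)):
        c = cyclic_descents(p)
        total += c * c * Fraction(comb(n + k - c - 1, n - 1), n * k ** (n - 1))
    return total


def second_moment_formula(n, k):
    """k^2 - n(n+1) S_n / k^(n-1) + n(nk-n-k) S_{n-1} / k^(n-1)."""
    s_n = sum(j ** n for j in range(1, k))
    s_n_minus_1 = sum(j ** (n - 1) for j in range(1, k))
    scale = k ** (n - 1)
    return (k * k
            - Fraction(n * (n + 1) * s_n, scale)
            + Fraction(n * (n * k - n - k) * s_n_minus_1, scale))


print(second_moment_bruteforce(3, 2), second_moment_formula(3, 2))
```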

3. Poisson regime

A main result of this section is a Stein's method proof that for $k$ much
smaller than $n$, the random variable $X(\pi) := k - d(\pi)$ under the measure
$C_{k,n}$ is approximately Poisson with mean $\lambda := \frac{k}{n}$. Then we show how this
can be used to deduce Poisson limits for $k - c(\pi)$ under the measure $C_{k,n}$
and for $k - 1 - d(\pi)$ under the measure $R_{k,n}$.

To begin we recall Stein's method for Poisson approximation. A book
length treatment of Stein's method for Poisson approximation is [BarHJ],
but that book emphasizes the coupling approach. We prefer to work from
first principles along the lines of Stein's original formulation as presented in
[St].

Throughout we use the exchangeable pair $(X, X')$, where $\pi$ and $\pi'$ are as
in Section 2, $X(\pi) = k - d(\pi)$, $X' = X(\pi')$, and the underlying probability
measure is $C_{k,n}$. Let $P_\lambda$ denote probability under the Poisson distribution of
mean $\lambda$, and as usual let $P_C$ denote probability with respect to the measure
$C_{k,n}$. Let $A$ be a subset of $\mathbb{Z}^+$, the set of non-negative integers. Stein's method
is based on the following "Stein's equation"

$$P_C(X \in A) - P_\lambda(A) = E_C(iT_\lambda - T\alpha) g_{\lambda,A}.$$

Let us specify the terms on the right hand side of the equation, in the
special case of interest to us.

(1) The function $g = g_{\lambda,A} : \mathbb{Z}^+ \to \mathbb{R}$ is constructed to solve the equation

$$\lambda g(j+1) - j g(j) = I_{j \in A} - P_\lambda(A), \quad j \ge 0$$

where $g(0)$ is taken to be 0. We also need the following lemma which
bounds certain quantities related to $g$.

Lemma 3.1. ([BarHJ], Lemma 1.1.1)

(a) Let $\|g\|$ denote $\sup_j |g_{\lambda,A}(j)|$. Then $\|g\| \le 1$ for all $A$.

(b) Let $\Delta(g)$ denote $\sup_j |g_{\lambda,A}(j+1) - g_{\lambda,A}(j)|$. Then $\Delta(g) \le 1$ for
all $A$.

(2) The map $T_\lambda$ sends real valued functions on $\mathbb{Z}^+$ to real valued functions
on $\mathbb{Z}^+$ and is defined by $T_\lambda(f)[j] = \lambda f(j+1) - j f(j)$.

(3) The map $i$ sends real valued functions on $\mathbb{Z}^+$ to real valued functions
on $S_n$, the symmetric group. It is defined by $(if)[\pi] = f(X(\pi))$.

(4) The map $T$ is a map from the set of real valued antisymmetric functions
on $S_n \times S_n$ to the set of real valued functions on $S_n$. It is defined
by $Tf[\pi] = E_C(f(\pi, \pi') \mid \pi)$. (Since the pair $(\pi, \pi')$ is exchangeable,
$E_C(Tf) = 0$, which is crucial for the proof of the Stein equation).

(5) Finally (and this is where one has to make a careful choice), the map
$\alpha$ is a map from real valued functions on $\mathbb{Z}^+$ to antisymmetric real
valued functions on $S_n \times S_n$. In the proof of Theorem 3.3 we will
specify which $\alpha$ we use.

In order to approximate X by a Poisson(A) random variable, it will be 
useful to approximate the mean of the random variable by A. This is 
accomplished in the next lemma, the second part of which is not needed in 
the sequel. 



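Lemma 3.2. Let $\lambda = \frac{k}{n}$ where $k, n$ are positive integers. Then

(1) $\left| \frac{E_C(c)}{n} - \lambda \right| \le k\left(1 - \frac{1}{k}\right)^n$.

(2) $E_C\left| \frac{c}{n} - \lambda \right| \le k\left(1 - \frac{1}{k}\right)^n$.

Proof. For the first assertion, note that by Proposition 2.7

$$\left| \frac{E_C(c)}{n} - \lambda \right| = \frac{1}{k^{n-1}} \sum_{j=1}^{k-1} j^{n-1} \le \frac{(k-1)^n}{k^{n-1}}.$$

The second assertion follows since the formula for $C_{k,n}$ forces $c \le k$ with
probability 1. □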



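Theorem 3.3. Let $\lambda = \frac{k}{n}$ where $k, n$ are positive integers. Then for any
$A \subseteq \mathbb{Z}^+$,

$$|P_C(k - d(\pi) \in A) - P_\lambda(A)| \le \left(\frac{k}{n}\right)^2 + k(n+1)\left(1 - \frac{1}{k}\right)^n.$$

Proof. As above, let $X(\pi) = k - d(\pi)$ and $X'(\pi) = k - d(\pi')$. Since $\lambda, A$ are
fixed, throughout the proof the function $g_{\lambda,A}$ is denoted by $g$. We specify
the map $\alpha$ to be used in the "Stein equation"

$$P_C(X \in A) - P_\lambda(A) = E_C(iT_\lambda - T\alpha) g.$$

Given a real valued function $f$ on $\mathbb{Z}^+$, we define $\alpha f$ by

$$\alpha f[\pi_1, \pi_2] = f(X(\pi_2)) I_{X(\pi_2) = X(\pi_1)+1} - f(X(\pi_1)) I_{X(\pi_1) = X(\pi_2)+1}.$$

Note that as required this is an antisymmetric function on $S_n \times S_n$.
Then one computes that $T\alpha g$ is the function on $S_n$ defined by

$$T\alpha g(\pi) = E_C(\alpha g(\pi, \pi') \mid \pi) = E_C\left(g(X') I_{X' = X+1} - g(X) I_{X = X'+1} \mid \pi\right).$$

Thus by the reasoning of Lemma 2.1

$$T\alpha g(\pi) = g(X+1) I_{\chi_n(\pi)=0} \frac{c(\pi)}{n} - g(X) I_{\chi_n(\pi)=1} \left(1 - \frac{c(\pi)}{n}\right).$$

Since $iT_\lambda g(\pi) = \lambda g(X+1) - X g(X)$ and $I_{\chi_n(\pi)=0} = 1 - I_{\chi_n(\pi)=1}$, one
concludes that

$$(iT_\lambda g - T\alpha g)[\pi] = \left[\left(\lambda - \frac{c(\pi)}{n}\right) g(X+1)\right] + \left[(I_{\chi_n(\pi)=1} - X) g(X)\right] + \left[\frac{c(\pi)}{n} I_{\chi_n(\pi)=1} (g(X+1) - g(X))\right]$$

$$= \left[\left(\lambda - \frac{c(\pi)}{n}\right) g(X+1)\right] + \left[(c(\pi) - k) g(X)\right] + \left[\frac{c(\pi)}{n} I_{\chi_n(\pi)=1} (g(X+1) - g(X))\right].$$

Thus to complete the proof, for each of the three terms in square brackets,
we bound the expectation under the measure $C_{k,n}$. Lemma 3.1 and part 1
of Lemma 3.2 give that

$$\left| E_C\left[ g(X+1)\left(\lambda - \frac{c(\pi)}{n}\right) \right] \right| \le \|g\|\, E_C\left(\left|\lambda - \frac{c(\pi)}{n}\right|\right) \le E_C\left(\left|\lambda - \frac{c(\pi)}{n}\right|\right) \le k\left(1 - \frac{1}{k}\right)^n,$$

where the last step uses that $c(\pi) \le k$ with probability 1, so that
$\lambda - \frac{c(\pi)}{n}$ is nonnegative.

For the second term in square brackets, one argues as for the first term in
square brackets to get an upper bound of $nk\left(1 - \frac{1}{k}\right)^n$.

To bound the expectation of the third term in square brackets, note that
the nonnegativity of $c(\pi) I_{\chi_n(\pi)=1}$ and Lemma 3.1 imply that

$$\left| E_C\left[ \frac{c(\pi)}{n} I_{\chi_n(\pi)=1} (g(X+1) - g(X)) \right] \right| \le \Delta(g)\, E_C\left(\frac{c(\pi)}{n} I_{\chi_n(\pi)=1}\right) \le E_C\left(\frac{c(\pi)}{n} I_{\chi_n(\pi)=1}\right) = \frac{E_C(c^2)}{n^2},$$

where the final equality uses the symmetry of the variables $\chi_1(\pi), \cdots, \chi_n(\pi)$ as in
the proof of Lemma 2.3.

By the explicit formula for $C_{k,n}$, it follows that $c(\pi) \le k$ with probability 1.
Hence this is at most $\left(\frac{k}{n}\right)^2$. □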
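For small $n$ the total variation distance in the theorem can be computed exactly by enumeration and compared with the bound $\left(\frac{k}{n}\right)^2 + k(n+1)\left(1-\frac{1}{k}\right)^n$. The sketch below does this by brute force; the function names are hypothetical, the measure is assumed to be $C_{k,n}(\pi) = \binom{n+k-c(\pi)-1}{n-1}/(nk^{n-1})$, and floating point arithmetic is used, so the comparison is only approximate.

```python
from collections import Counter
from itertools import permutations
from math import comb, exp, factorial


def tv_to_poisson(n, k):
    """sup_A |P_C(k - d in A) - Poisson_{k/n}(A)|, by enumerating S_n."""
    lam = k / n
    law = Counter()
    for p in permutations(range(1, n + 1)):
        d = sum(1 for a, b in zip(p, p[1:]) if a > b)
        c = d + (1 if p[-1] > p[0] else 0)
        weight = comb(n + k - c - 1, n - 1) / (n * k ** (n - 1))
        if weight > 0:
            law[k - d] += weight
    top = max(law) + 1
    pois = [exp(-lam) * lam ** j / factorial(j) for j in range(top)]
    # Poisson mass beyond the support of X also counts toward the distance.
    tail = max(1.0 - sum(pois), 0.0)
    return 0.5 * (sum(abs(law.get(j, 0.0) - pois[j]) for j in range(top)) + tail)


def stein_bound(n, k):
    """The bound (k/n)^2 + k (n+1) (1 - 1/k)^n."""
    return (k / n) ** 2 + k * (n + 1) * (1 - 1 / k) ** n


for n in (4, 5, 6, 7):
    print(n, tv_to_poisson(n, 2), stein_bound(n, 2))
```

The bound is only informative when $k$ is small relative to $n$; for larger $k$ it exceeds 1 and carries no information.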

To conclude this section, we show how Theorem 3.3 can be used to deduce
Poisson approximations for the two statistics we really care about: $k - c(\pi)$
under the measure $C_{k,n}$ and $k - 1 - d(\pi)$ under the measure $R_{k,n}$.

Proposition 3.4. For all $A \subseteq \mathbb{Z}^+$,

$$|P_C(k - d(\pi) \in A) - P_C(k - c(\pi) \in A)| \le \frac{2k}{n}.$$

Proof. Observe that for any $l \ge 0$,

$$P_C(d = l) = P_C(d' = l)$$
$$= P_C(d' = l, c = l) + P_C(d' = l, c = l+1)$$
$$= P_C(c = l) P_C(d' = l \mid c = l) + P_C(c = l+1) P_C(d' = l \mid c = l+1)$$
$$= P_C(c = l) \frac{n-l}{n} + P_C(c = l+1) \frac{l+1}{n}.$$

Thus

$$P_C(d = l) - P_C(c = l) = -\frac{l}{n} P_C(c = l) + \frac{l+1}{n} P_C(c = l+1)$$

which implies that

$$|P_C(d = l) - P_C(c = l)| \le \frac{l}{n} P_C(c = l) + \frac{l+1}{n} P_C(c = l+1).$$

Summing over $l \ge 0$ gives that

$$\sum_{l \ge 0} |P_C(d = l) - P_C(c = l)| \le \sum_{l \ge 0} \left( \frac{l}{n} P_C(c = l) + \frac{l+1}{n} P_C(c = l+1) \right)
= \frac{2}{n} \sum_{l \ge 0} l\, P_C(c = l) = \frac{2}{n} E_C(c) \le \frac{2k}{n},$$

where the final inequality is Proposition 2.7 (or also since $c \le k$ with probability 1).
The result follows. □

Corollary 3.5. Let $\lambda = \frac{k}{n}$ where $k, n$ are positive integers. Then for any
$A \subseteq \mathbb{Z}^+$,

$$|P_C(k - c(\pi) \in A) - P_\lambda(A)| \le \left(\frac{k}{n}\right)^2 + \frac{2k}{n} + k(n+1)\left(1 - \frac{1}{k}\right)^n.$$

Proof. This is immediate from Theorem 3.3 and Proposition 3.4. □

Proposition 3.6 shows that the distribution of $d(\pi) + 1$ under the measure
$R_{k,n}$ is exactly the same as the distribution of $c(\pi)$ under the measure $C_{k,n+1}$.

Proposition 3.6. For any $r \ge 0$,

$$P_{R_{k,n}}(d(\pi) = r) = P_{C_{k,n+1}}(c(\pi) = r+1).$$

Proof. Note from the formula for $R_{k,n}$ that for any $r$, the probability of $r$
descents under the measure $R_{k,n}$ is $\binom{n+k-r-1}{n}/k^n$ multiplied by the number of
permutations in $S_n$ with $r$ descents. Similarly, from the formula for $C_{k,n+1}$
one sees that for any $r$, the probability of $r+1$ cyclic descents under the
measure $C_{k,n+1}$ is $\binom{n+k-r-1}{n}/((n+1)k^n)$ multiplied by the number of permutations in
$S_{n+1}$ with $r+1$ cyclic descents. The result follows from Lemma 2.5. □
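The identity between the law of $d$ under $R_{k,n}$ and the law of $c-1$ under $C_{k,n+1}$ can be confirmed exactly for small parameters. A minimal sketch with hypothetical function names, using the two probability formulas quoted in the proof:

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import comb


def descents(p):
    """Number of positions i with p(i) > p(i+1)."""
    return sum(1 for a, b in zip(p, p[1:]) if a > b)


def law_of_d_under_R(n, k):
    """Exact distribution of d(pi) when pi has law R_{k,n}."""
    law = Counter()
    for p in permutations(range(1, n + 1)):
        d = descents(p)
        law[d] += Fraction(comb(n + k - d - 1, n), k ** n)
    return {r: prob for r, prob in law.items() if prob}


def law_of_c_minus_1_under_C(n, k):
    """Exact distribution of c(pi) - 1 when pi has law C_{k,n}."""
    law = Counter()
    for p in permutations(range(1, n + 1)):
        c = descents(p) + (1 if p[-1] > p[0] else 0)
        law[c - 1] += Fraction(comb(n + k - c - 1, n - 1), n * k ** (n - 1))
    return {r: prob for r, prob in law.items() if prob}


print(law_of_d_under_R(3, 2) == law_of_c_minus_1_under_C(4, 2))
```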

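Corollary 3.7. Let $\lambda = \frac{k}{n+1}$ where $k, n$ are positive integers. Then for any
$A \subseteq \mathbb{Z}^+$,

$$|P_{R_{k,n}}(k - 1 - d(\pi) \in A) - P_\lambda(A)| \le \left(\frac{k}{n+1}\right)^2 + \frac{2k}{n+1} + k(n+2)\left(1 - \frac{1}{k}\right)^{n+1}.$$

Proof. This is immediate from Corollary 3.5 and Proposition 3.6. □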

4. Other regimes

This section is organized into three subsections. Subsection 4.1 gives good
asymptotic results for the mean and variance of $c(\pi)$ under $C_{k,n}$ (and so also
for $d(\pi)$ under $R_{k,n}$). Subsection 4.2 develops further properties of the exchangeable
pair $(d, d')$ under $C_{k,n}$ which are relevant to normal approximation.
Subsection 4.3 gives a speedy algorithm for sampling from the measures $C_{k,n}$
and $R_{k,n}$.

4.1. Asymptotics of mean and variance. This subsection derives sharp
estimates for the mean and variance of $c(\pi)$ under the measure $C_{k,n}$ when
$\frac{k}{n} > \frac{1}{2\pi}$. Since by Proposition 3.6 the distribution of $d(\pi)$ under $R_{k,n}$ is the
same as the distribution of $c(\pi) - 1$ under $C_{k,n+1}$, one immediately obtains
results (which we stated in the introduction) for the mean and variance of
$d(\pi)$ under $R_{k,n}$. We also remark that Corollaries 2.2 and 2.4 imply results
for $d(\pi)$ under $C_{k,n}$.

Throughout we will use information about the Bernoulli numbers $B_n$.
They are defined by the generating function $f(z) = \sum_{n \ge 0} \frac{B_n z^n}{n!} = \frac{z}{e^z - 1}$, so
that $B_0 = 1$, $B_1 = -\frac{1}{2}$, $B_2 = \frac{1}{6}$, $B_3 = 0$, $B_4 = -\frac{1}{30}$ and $B_i = 0$ if $i \ge 3$ is
odd. The zero at 0 in the denominator of $f(z)$ cancels with the zero of $z$, so
$f(z)$ is analytic for $|z| < 2\pi$ but has first order poles at $z = \pm 2\pi i, \pm 4\pi i, \cdots$.
We also use the notation that $(n)_t$ denotes $n(n-1) \cdots (n-t+1)$ and that
$(n)_0 = 1$.

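To see the connection with Bernoulli numbers, Lemma 4.1 shows how to
write $E_C(c)$ in terms of them.

Lemma 4.1.

$$E_C(c) = -k \sum_{t=1}^{n-1} \frac{B_t (n)_t}{t!\, k^t}.$$

Proof. This follows from Proposition 2.7 and by the expansion of partial
power sums in [GR]:

$$\sum_{r=0}^{a-1} r^n = \frac{a^n}{n+1} \left( a + \sum_{t=0}^{n-1} \frac{B_{t+1} (n+1)_{t+1}}{(t+1)!\, a^t} \right).$$

□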

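Lemmas 4.2 and 4.3 give two elementary estimates.

Lemma 4.2. For $0 \le t \le n$,

$$\left| 1 - \frac{(n)_t}{n^t} - \frac{\binom{t}{2}}{n} \right| \le \frac{\binom{t}{2}^2}{2n^2}.$$

Proof. For $t = 0, 1$ the result is clear, so suppose that $t \ge 2$. We show that

$$\frac{\binom{t}{2}}{n} - \frac{\binom{t}{2}^2}{2n^2} \le 1 - \frac{(n)_t}{n^t} \le \frac{\binom{t}{2}}{n}.$$

To see this write $1 - \frac{(n)_t}{n^t} = 1 - (1 - \frac{1}{n})(1 - \frac{2}{n}) \cdots (1 - \frac{t-1}{n})$. One proves by
induction that if $0 \le x_1, \cdots, x_n \le 1$, then $\prod_j (1 - x_j) \ge 1 - \sum_j x_j$. Thus

$$\left(1 - \frac{1}{n}\right) \cdots \left(1 - \frac{t-1}{n}\right) \ge 1 - \sum_{j=1}^{t-1} \frac{j}{n} = 1 - \frac{\binom{t}{2}}{n}$$

which proves the upper bound. For the lower bound note that

$$\left(1 - \frac{1}{n}\right) \cdots \left(1 - \frac{t-1}{n}\right) = e^{\log(1 - 1/n) + \cdots + \log(1 - (t-1)/n)} \le e^{-\binom{t}{2}/n} \le 1 - \frac{\binom{t}{2}}{n} + \frac{\binom{t}{2}^2}{2n^2}.$$

The last inequality is on page 103 of [HLP]. □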
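Lemma 4.3. Suppose that $\alpha > \frac{1}{2\pi}$. Then for $n \ge 1$,

$$\sum_{t=n}^{\infty} \frac{|B_t|\, t^l}{t!\, \alpha^t} \le \frac{C_{l,\alpha}\, n^l}{(2\pi\alpha)^n}$$

where $C_{l,\alpha}$ is a constant depending on $l$ and $\alpha$ (and if $\alpha \ge 1$ the constant
depends only on $l$).

Proof. Recall that $B_t$ vanishes for $t \ge 3$ odd and that there is a bound
$|B_{2t}| \le 8\sqrt{\pi t}\left(\frac{t}{\pi e}\right)^{2t}$ for $t \ge 1$ [Le]. Combining this with Stirling's bound
$t! \ge \sqrt{2\pi t}\left(\frac{t}{e}\right)^t$ one concludes that

$$\sum_{t=n}^{\infty} \frac{|B_t|\, t^l}{t!\, \alpha^t} \le C \sum_{t=n}^{\infty} \frac{t^l}{(2\pi\alpha)^t} = \frac{C}{(2\pi\alpha)^n} \sum_{t=0}^{\infty} \frac{(t+n)^l}{(2\pi\alpha)^t} \le \frac{C n^l}{(2\pi\alpha)^n} \sum_{t=0}^{\infty} \frac{(t+1)^l}{(2\pi\alpha)^t}$$

where $C$ is a universal constant. The ratio test shows that $\sum_{t=0}^{\infty} \frac{(t+1)^l}{(2\pi\alpha)^t}$
converges for $2\pi\alpha > 1$ (and moreover is at most a constant depending on $l$
if $\alpha \ge 1$). □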

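We also require a result about Bernoulli numbers.

Lemma 4.4. Suppose that $\alpha > \frac{1}{2\pi}$. Then

(1) $\sum_{t=0}^{\infty} \frac{B_t}{t!\, \alpha^t} = \frac{1}{\alpha(e^{1/\alpha} - 1)}$.

(2) $\sum_{t=0}^{\infty} \frac{B_t \binom{t}{2}}{t!\, \alpha^t} = \frac{e^{1/\alpha}(-2\alpha e^{1/\alpha} + 2\alpha + e^{1/\alpha} + 1)}{2\alpha^3 (e^{1/\alpha} - 1)^3}$.

(3) $\sum_{t=0}^{\infty} \frac{B_{t+1}}{t!\, \alpha^t} = \frac{\alpha e^{1/\alpha} - e^{1/\alpha} - \alpha}{\alpha (e^{1/\alpha} - 1)^2}$.

(4) $\sum_{t=0}^{\infty} \frac{B_{t+1} \binom{t}{2}}{t!\, \alpha^t} = \frac{e^{1/\alpha}(3\alpha e^{2/\alpha} - e^{2/\alpha} - 4e^{1/\alpha} - 3\alpha - 1)}{2\alpha^3 (e^{1/\alpha} - 1)^4}$.

Proof. We use the generating function $f(z) = \sum_{t \ge 0} \frac{B_t z^t}{t!} = \frac{z}{e^z - 1}$ for the
Bernoulli numbers, which as mentioned earlier is analytic for $|z| < 2\pi$. For
the first assertion simply set $z = 1/\alpha$. For the second assertion, one computes
$\frac{z^2}{2} \frac{d^2}{dz^2} f(z)$ and evaluates it at $z = 1/\alpha$. For the third equation one
differentiates $f(z)$ with respect to $z$ and then sets $z = \frac{1}{\alpha}$. For the fourth
equation one differentiates $f(z)$ three times with respect to $z$, then multiplies
by $\frac{z^2}{2}$ and sets $z = \frac{1}{\alpha}$. □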
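The four closed forms obtained by differentiating $z/(e^z-1)$ can be checked numerically against partial sums of the series. The sketch below is illustrative: the function names and the truncation at 60 terms are assumptions, and the Bernoulli numbers (with $B_1 = -\frac{1}{2}$) are generated from the standard recurrence $\sum_{j=0}^{t} \binom{t+1}{j} B_j = 0$ for $t \ge 1$.

```python
from fractions import Fraction
from math import comb, exp, factorial


def bernoulli_numbers(m):
    """B_0..B_m as exact fractions, with the convention B_1 = -1/2."""
    B = [Fraction(1)]
    for t in range(1, m + 1):
        B.append(-sum(comb(t + 1, j) * B[j] for j in range(t)) / (t + 1))
    return B


def series(alpha, shift, with_binomial, terms=60):
    """Partial sum of B_{t+shift} * w_t / (t! alpha^t), w_t = binom(t,2) or 1."""
    B = bernoulli_numbers(terms + shift)
    total = Fraction(0)
    for t in range(terms):
        w = comb(t, 2) if with_binomial else 1
        total += B[t + shift] * w / (factorial(t) * Fraction(alpha) ** t)
    return float(total)


def closed_forms(alpha):
    """Right hand sides of the four identities, in order."""
    e = exp(1 / alpha)
    return (
        1 / (alpha * (e - 1)),
        e * (-2 * alpha * e + 2 * alpha + e + 1) / (2 * alpha ** 3 * (e - 1) ** 3),
        (alpha * e - e - alpha) / (alpha * (e - 1) ** 2),
        e * (3 * alpha * e ** 2 - e ** 2 - 4 * e - 3 * alpha - 1)
        / (2 * alpha ** 3 * (e - 1) ** 4),
    )


alpha = 1.0
sums = (series(alpha, 0, False), series(alpha, 0, True),
        series(alpha, 1, False), series(alpha, 1, True))
for lhs, rhs in zip(sums, closed_forms(alpha)):
    print(lhs, rhs)
```

The series converge geometrically with ratio about $1/(2\pi\alpha)$, so the truncation is harmless for the values of $\alpha$ tried here but would need more terms as $\alpha$ approaches $\frac{1}{2\pi}$.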
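Next we give an estimate for $E_C(c)$.

Proposition 4.5. Suppose that $k = \alpha n$ with $\alpha > \frac{1}{2\pi}$. Then

$$\left| E_C(c(\pi)) - n\left(\alpha - \frac{1}{e^{1/\alpha} - 1}\right) - \frac{e^{1/\alpha}(-2\alpha e^{1/\alpha} + 2\alpha + e^{1/\alpha} + 1)}{2\alpha^2 (e^{1/\alpha} - 1)^3} \right| \le \frac{C_\alpha}{n}$$

where $C_\alpha$ is a constant depending on $\alpha$ (and which is independent of $\alpha$ for
$\alpha \ge 1$).

Proof. By Lemma 4.1,

$$E_C(c) = k - k \sum_{t=0}^{n-1} \frac{B_t (n)_t}{t!\, k^t}
= k - k \sum_{t=0}^{n-1} \frac{B_t}{t!\, \alpha^t} + k \sum_{t=0}^{n-1} \frac{B_t}{t!\, \alpha^t} \left(1 - \frac{(n)_t}{n^t}\right)$$

$$= \alpha n - \alpha n \sum_{t=0}^{n-1} \frac{B_t}{t!\, \alpha^t} + \alpha n \sum_{t=0}^{n-1} \frac{B_t}{t!\, \alpha^t} \left(1 - \frac{(n)_t}{n^t}\right).$$

From this and Lemma 4.2 it follows that

$$\left| E_C(c) - \left( \alpha n - \alpha n \sum_{t=0}^{\infty} \frac{B_t}{t!\, \alpha^t} + \alpha \sum_{t=0}^{n-1} \frac{B_t \binom{t}{2}}{t!\, \alpha^t} \right) \right|
\le \alpha n \sum_{t=n}^{\infty} \frac{|B_t|}{t!\, \alpha^t} + \frac{\alpha}{2n} \sum_{t=0}^{n-1} \frac{|B_t| \binom{t}{2}^2}{t!\, \alpha^t}.$$

Thus

$$\left| E_C(c) - \left[ \alpha n - \alpha n \sum_{t=0}^{\infty} \frac{B_t}{t!\, \alpha^t} + \alpha \sum_{t=0}^{\infty} \frac{B_t \binom{t}{2}}{t!\, \alpha^t} \right] \right|$$

is at most the "error term"

$$\alpha n \sum_{t=n}^{\infty} \frac{|B_t|}{t!\, \alpha^t} + \frac{\alpha}{2n} \sum_{t=0}^{n-1} \frac{|B_t| \binom{t}{2}^2}{t!\, \alpha^t} + \alpha \sum_{t=n}^{\infty} \frac{|B_t| \binom{t}{2}}{t!\, \alpha^t}.$$

From Lemma 4.3 the error term is at most $\frac{C_\alpha}{n}$ where $C_\alpha$ is a constant
depending on $\alpha$ (and which is independent of $\alpha$ for $\alpha \ge 1$). The result now
follows from parts 1 and 2 of Lemma 4.4. □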
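To get a feel for the quality of this estimate, one can compare the exact mean with the two-term approximation for moderate $n$. The sketch below assumes the exact mean has the closed form $E_C(c) = k - \frac{n}{k^{n-1}}\sum_{j=1}^{k-1} j^{n-1}$ (the formula obtained from the coefficient extraction in Section 2); the function names are hypothetical.

```python
from fractions import Fraction
from math import exp


def exact_mean_c(n, k):
    """Assumed exact mean: k - n k^{-(n-1)} sum_{j=1}^{k-1} j^{n-1}."""
    return k - Fraction(n * sum(j ** (n - 1) for j in range(1, k)), k ** (n - 1))


def approx_mean_c(n, alpha):
    """Leading term plus the constant correction from the proposition."""
    e = exp(1 / alpha)
    lead = n * (alpha - 1 / (e - 1))
    corr = e * (-2 * alpha * e + 2 * alpha + e + 1) / (2 * alpha ** 2 * (e - 1) ** 3)
    return lead + corr


for n in (5, 10, 20, 40):
    err = abs(float(exact_mean_c(n, n)) - approx_mean_c(n, 1.0))
    # If the error is O(1/n), the last column should stay bounded.
    print(n, err, n * err)
```

In this sketch the product `n * err` stays small and roughly constant for $\alpha = 1$, consistent with an error of order $1/n$.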

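Proposition 4.6 estimates the variance of $c(\pi)$. The bounds are significantly
better than those in the literature.

Proposition 4.6. Suppose that $k = \alpha n$ with $\alpha > \frac{1}{2\pi}$. Then

$$\left| \mathrm{Var}_C(c(\pi)) - n\,\frac{e^{1/\alpha}(\alpha^2 e^{2/\alpha} + \alpha^2 - 2\alpha^2 e^{1/\alpha} - e^{1/\alpha})}{\alpha^2 (e^{1/\alpha} - 1)^4} \right| \le A_\alpha$$

where $A_\alpha$ is a constant depending on $\alpha$ (and which is independent of $\alpha$ for
$\alpha \ge 1$).

Proof. From Proposition 2.8 and the expansion of partial power sums in
[GR]:

$$\sum_{r=0}^{a-1} r^n = \frac{a^n}{n+1} \left( a + \sum_{t=0}^{n-1} \frac{B_{t+1} (n+1)_{t+1}}{(t+1)!\, a^t} \right)$$

it follows that

$$E_C(c^2) = k^2 - \frac{n(n+1)}{k^{n-1}} \sum_{j=1}^{k-1} j^n + \frac{n(nk-n-k)}{k^{n-1}} \sum_{j=1}^{k-1} j^{n-1}$$

$$= k^2 - nk \left( k + \sum_{t=0}^{n-1} \frac{B_{t+1} (n+1)_{t+1}}{(t+1)!\, k^t} \right) + (nk - n - k) \left( k + \sum_{t=0}^{n-2} \frac{B_{t+1} (n)_{t+1}}{(t+1)!\, k^t} \right)$$

$$= -nk - nk \sum_{t=0}^{n-1} \frac{B_{t+1}}{(t+1)!\, k^t} \left[ (n+1)_{t+1} - (n)_{t+1} \right] - (n+k) \sum_{t=0}^{n-1} \frac{B_{t+1} (n)_{t+1}}{(t+1)!\, k^t} - \frac{(nk-n-k) B_n}{k^{n-1}}.$$

This simplifies to

$$-nk - nk \sum_{t=0}^{n-1} \frac{B_{t+1} (n)_t}{t!\, k^t} - (nk + k^2) \sum_{t=1}^{n} \frac{B_t (n)_t}{t!\, k^t} - \frac{(nk-n-k) B_n}{k^{n-1}}$$

$$= k^2 - nk \sum_{t=0}^{n-1} \frac{B_{t+1} (n)_t}{t!\, k^t} - (nk + k^2) \sum_{t=0}^{n} \frac{B_t (n)_t}{t!\, k^t} - \frac{(nk-n-k) B_n}{k^{n-1}}$$

$$= \alpha^2 n^2 - \alpha n^2 \sum_{t=0}^{n-1} \frac{B_{t+1}}{t!\, \alpha^t} - (\alpha n^2 + \alpha^2 n^2) \sum_{t=0}^{n} \frac{B_t}{t!\, \alpha^t}$$
$$\quad + \alpha n^2 \sum_{t=0}^{n-1} \frac{B_{t+1}}{t!\, \alpha^t} \left(1 - \frac{(n)_t}{n^t}\right) + (\alpha n^2 + \alpha^2 n^2) \sum_{t=0}^{n} \frac{B_t}{t!\, \alpha^t} \left(1 - \frac{(n)_t}{n^t}\right) - \frac{(\alpha n^2 - n - \alpha n) B_n}{(\alpha n)^{n-1}}.$$

Lemma 4.2 implies that the absolute value of the difference between
$E_C(c^2)$ and

$$\alpha^2 n^2 - \alpha n^2 \sum_{t=0}^{\infty} \frac{B_{t+1}}{t!\, \alpha^t} - (\alpha n^2 + \alpha^2 n^2) \sum_{t=0}^{\infty} \frac{B_t}{t!\, \alpha^t}
+ \alpha n \sum_{t=0}^{n-1} \frac{B_{t+1} \binom{t}{2}}{t!\, \alpha^t} + (\alpha n + \alpha^2 n) \sum_{t=0}^{n} \frac{B_t \binom{t}{2}}{t!\, \alpha^t}$$

is at most

$$\alpha n^2 \sum_{t=n}^{\infty} \frac{|B_{t+1}|}{t!\, \alpha^t} + (\alpha n^2 + \alpha^2 n^2) \sum_{t=n+1}^{\infty} \frac{|B_t|}{t!\, \alpha^t} + \frac{\alpha}{2} \sum_{t=0}^{n-1} \frac{|B_{t+1}| \binom{t}{2}^2}{t!\, \alpha^t}
+ \frac{(\alpha + \alpha^2)}{2} \sum_{t=0}^{n} \frac{|B_t| \binom{t}{2}^2}{t!\, \alpha^t} + \frac{|\alpha n^2 - n - \alpha n|\, |B_n|}{(\alpha n)^{n-1}}.$$

Thus the difference between $E_C(c^2)$ and

$$\alpha^2 n^2 - \alpha n^2 \sum_{t=0}^{\infty} \frac{B_{t+1}}{t!\, \alpha^t} - (\alpha n^2 + \alpha^2 n^2) \sum_{t=0}^{\infty} \frac{B_t}{t!\, \alpha^t}
+ \alpha n \sum_{t=0}^{\infty} \frac{B_{t+1} \binom{t}{2}}{t!\, \alpha^t} + (\alpha n + \alpha^2 n) \sum_{t=0}^{\infty} \frac{B_t \binom{t}{2}}{t!\, \alpha^t}$$

is upper bounded by the "error term"

$$\alpha n^2 \sum_{t=n}^{\infty} \frac{|B_{t+1}|}{t!\, \alpha^t} + (\alpha n^2 + \alpha^2 n^2) \sum_{t=n+1}^{\infty} \frac{|B_t|}{t!\, \alpha^t} + \frac{\alpha}{2} \sum_{t=0}^{\infty} \frac{|B_{t+1}| \binom{t}{2}^2}{t!\, \alpha^t}
+ \frac{(\alpha + \alpha^2)}{2} \sum_{t=0}^{\infty} \frac{|B_t| \binom{t}{2}^2}{t!\, \alpha^t} + \frac{|\alpha n^2 - n - \alpha n|\, |B_n|}{(\alpha n)^{n-1}}.$$

Next it is necessary to bound the five summands in the error term. Lemma
4.3 shows that the first four summands in the error term are at most a
constant depending on $\alpha$ (or a universal constant if $\alpha \ge 1$). Since $B_t$ vanishes
for $t \ge 3$ odd, $|B_{2t}| \le 8\sqrt{\pi t}\left(\frac{t}{\pi e}\right)^{2t}$ [Le], and $2\pi\alpha > 1$, the fifth summand in
the error term goes to 0 as $n \to \infty$, and in particular is bounded by a universal constant.

The result now follows by combining the above observations with Lemma
4.4 and Proposition 4.5. □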

4.2. Further properties of the exchangeable pair. This subsection
develops further properties of the exchangeable pair $(d, d')$ from Section
2. Since we are interested in central limit theorems, it is natural to instead
study $(W, W')$ where $W = \frac{d - E_C(d)}{\sqrt{\mathrm{Var}_C(d)}}$ and $W' = \frac{d' - E_C(d)}{\sqrt{\mathrm{Var}_C(d)}}$. Note
that from Lemma 2.1 one knows $E_C(W' - W \mid \pi)$. In what follows we study
$E_C(W' - W \mid W)$, which is typically used in normal approximation by Stein's
method.

Proposition 4.7.
\[
E_c(W' - W \mid d = r) = -\frac{W}{n} + \frac{1}{n\sqrt{Var_c(d)}}
\left( \frac{P_c(c = r+1)\,(r+1)(n-1)}{P_c(d = r)\, n} - E_c(d) \right).
\]

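Proof. Since $d$ is a function of $\pi$,
\[
\begin{aligned}
E_c(W' - W \mid d = r)
&= \sum_a \frac{a\, P_c(W' - W = a,\ d = r)}{P_c(d = r)} \\
&= \sum_a \sum_{\pi : d(\pi) = r} \frac{a\, P_c(W' - W = a,\ \pi)}{P_c(d = r)} \\
&= \frac{1}{P_c(d = r)} \sum_{\pi : d(\pi) = r} P_c(\pi) \sum_a \frac{a\, P_c(W' - W = a,\ \pi)}{P_c(\pi)} .
\end{aligned}
\]
By Lemma 2.1 this is equal to
\[
\begin{aligned}
& \sum_{\pi : d(\pi) = r} \frac{P_c(\pi)}{P_c(d = r)}
\left( -\frac{r}{n\sqrt{Var_c(d)}} + \frac{(n-1)\, I_{\chi_n(\pi) = 1}}{n\sqrt{Var_c(d)}} \right) \\
&= -\frac{W}{n} + \frac{1}{n\sqrt{Var_c(d)}}
\left( \frac{(n-1) \sum_{\pi : d(\pi) = r} P_c(\pi)\, I_{\chi_n(\pi) = 1}}{P_c(d = r)} - E_c(d) \right) \\
&= -\frac{W}{n} + \frac{1}{n\sqrt{Var_c(d)}}
\left( \frac{P_c(c = r+1)\,(r+1)(n-1)}{P_c(d = r)\, n} - E_c(d) \right).
\end{aligned}
\]
□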



In most examples of Stein's method for normal approximation of a random
variable $W$, there is an exchangeable pair $(W, W')$ such that
$E(W' \mid W) = (1 - \lambda)W$. There are two recent papers ([RR], [Ch]) in
which the Stein technique has been extended to handle the case where
$E(W' \mid W) = (1 - \lambda)W + G(W)$ where $G(W)$ is small. The bounds in these
papers require that $\frac{E(|G(W)|)}{\lambda}$ goes to 0. Proposition 4.9 uses
interesting properties (Lemma 4.8) and asymptotics of Eulerian numbers to
prove that for our exchangeable pair $\frac{E(|G(W)|)}{\lambda}$ is bounded away
from 0, even for the uniform distribution on the symmetric group, where we
know that a central limit theorem holds.
Lemma 4.8. Let $A_{n,k}$ denote the number of permutations on $n$ symbols with
$k - 1$ descents.

(1) If $n$ is odd and $0 \le r \le n - 1$, then $(r+1)A_{n-1,r+1} \ge (n-r)A_{n-1,r}$
if and only if $0 \le r \le \frac{n-1}{2}$.

(2) If $n$ is even and $0 \le r \le n - 1$, then $(r+1)A_{n-1,r+1} \ge (n-r)A_{n-1,r}$
if and only if $0 \le r \le \frac{n}{2} - 1$.

Proof. Suppose first that $n$ is odd. For the if part, we proceed by reverse
induction on $r$; thus the base case is $r = \frac{n-1}{2}$, and then the inequality is an
equality since $A_{n-1,k} = A_{n-1,n-k}$ for all $k$. A result of Frobenius [FY] is that
the polynomial $\sum_{k \ge 1} z^k A_{n-1,k}$ has only real roots. Thus an inequality of
Newton (page 52 of [HLP]) implies that
\[
\frac{(r+1)A_{n-1,r+1}}{(n-r)A_{n-1,r}} \ge
\frac{A_{n-1,r+2}\,(r+2)(n-r-1)}{A_{n-1,r+1}\,(n-r-2)(n-r)} .
\]
By the induction hypothesis the right hand side is at least
$\frac{(n-r-1)^2}{(n-r-2)(n-r)} \ge 1$.
The only if part follows from the if part since $A_{n-1,k} = A_{n-1,n-k}$ for all $k$.

The case of $n$ even is similar. For the if part, we proceed by reverse
induction on $r$. The induction step is the same but the base case
$r = \frac{n}{2} - 1$ is not automatic. However it follows using Newton's inequality
of the previous paragraph together with the symmetry property
$A_{n-1,k} = A_{n-1,n-k}$:
\[
\left( A_{n-1,\frac{n}{2}} \right)^2 \ge
\frac{\frac{n}{2} + 1}{\frac{n}{2} - 1} \left( A_{n-1,\frac{n}{2}-1} \right)^2
\ge \left( A_{n-1,\frac{n}{2}-1} \right)^2 \left( 1 + \frac{2}{n} \right)^2 .
\]
Now take square roots. The only if part follows from the if part since
$A_{n-1,k} = A_{n-1,n-k}$ for all $k$. □
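
As a sanity check on the statement of Lemma 4.8, the inequality can be tested directly for small $n$. The sketch below is illustrative only: the function names `eulerian_row` and `lemma_4_8_holds` are hypothetical, and the Eulerian numbers are generated with the standard recurrence $A_{m,k} = k A_{m-1,k} + (m-k+1) A_{m-1,k-1}$, which is assumed here rather than taken from the text.

```python
def eulerian_row(m):
    """Return a list A with A[k] = number of permutations of m symbols
    having k - 1 descents (k = 1..m); A[0] and A[m + 1] are zero padding."""
    row = [0, 1, 0]  # m = 1: a single permutation with 0 descents
    for size in range(2, m + 1):
        new = [0] * (size + 2)
        for k in range(1, size + 1):
            new[k] = k * row[k] + (size - k + 1) * row[k - 1]
        row = new
    return row

def lemma_4_8_holds(n):
    """Check, for every 0 <= r <= n - 1, that
    (r+1) A_{n-1,r+1} >= (n-r) A_{n-1,r}  iff  r is at most the stated threshold."""
    A = eulerian_row(n - 1)
    threshold = (n - 1) // 2 if n % 2 == 1 else n // 2 - 1
    for r in range(n):
        lhs = (r + 1) * A[r + 1]
        rhs = (n - r) * A[r]
        if (lhs >= rhs) != (r <= threshold):
            return False
    return True

print(all(lemma_4_8_holds(n) for n in range(2, 60)))
```

Integer arithmetic is exact here, so the equality case at $r = \frac{n-1}{2}$ for odd $n$ is tested without rounding issues.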

Proposition 4.9. Let $U_n$ denote the uniform distribution on $S_n$. Let
$(W, W')$ be the exchangeable pair of this subsection, so that by Proposition
4.7, $E(W' \mid W) = (1 - \lambda)W + G(W)$ with $\lambda = \frac{1}{n}$. Then
$\frac{E_{U_n}(|G(W)|)}{\lambda}$ is bounded away from 0 as $n \to \infty$.

Proof. It is elementary that $E_{U_n}(d) = \frac{n-1}{2}$. Thus Proposition 4.7
implies that
\[
E_{U_n}(|G(W)|) = \frac{n-1}{n^2\sqrt{Var_{U_n}(d)}} \sum_{r=0}^{n-1}
\left| (r+1) P_{U_n}(c = r+1) - \frac{n}{2} P_{U_n}(d = r) \right| .
\]
Using the equality
\[
P_{U_n}(d = r) = \frac{r+1}{n} P_{U_n}(c = r+1) + \frac{n-r}{n} P_{U_n}(c = r)
\]
this simplifies to
\[
\frac{n-1}{2n^2\sqrt{Var_{U_n}(d)}} \sum_{r=0}^{n-1}
\left| (r+1) P_{U_n}(c = r+1) - (n-r) P_{U_n}(c = r) \right| .
\]
By Lemma 2.5 this further simplifies to
\[
\frac{n-1}{(n-1)!\, 2n^2\sqrt{Var_{U_n}(d)}} \sum_{r=0}^{n-1}
\left| (r+1) A_{n-1,r+1} - (n-r) A_{n-1,r} \right| ,
\]
where $A_{n,k}$ denotes the number of permutations in $S_n$ with $k - 1$ descents.

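Now suppose that $n$ is odd. Then the previous paragraph, Lemma 4.8,
and the symmetry $A_{n-1,k} = A_{n-1,n-k}$ imply that $E_{U_n}(|G(W)|)$ is equal to
\[
\begin{aligned}
& \frac{n-1}{(n-1)!\, n^2\sqrt{Var_{U_n}(d)}} \sum_{r=0}^{\frac{n-3}{2}}
\left( (r+1) A_{n-1,r+1} - (n-r) A_{n-1,r} \right) \\
&= \frac{n-1}{(n-1)!\, n^2\sqrt{Var_{U_n}(d)}}
\left( \sum_{r=1}^{\frac{n-1}{2}} r A_{n-1,r} - \sum_{r=1}^{\frac{n-3}{2}} (n-r) A_{n-1,r} \right) \\
&= \frac{n-1}{(n-1)!\, n^2\sqrt{Var_{U_n}(d)}}
\left( \frac{n+1}{2} A_{n-1,\frac{n-1}{2}} - \sum_{r=1}^{\frac{n-1}{2}} (n-2r) A_{n-1,r} \right) .
\end{aligned}
\]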



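A similar argument for $n$ even shows that $E_{U_n}(|G(W)|)$ is equal to
\[
\frac{n-1}{n^2\sqrt{Var_{U_n}(d)}}
\left( \frac{n A_{n-1,\frac{n}{2}}}{2(n-1)!}
- \frac{1}{(n-1)!} \sum_{r=1}^{\frac{n}{2}-1} (n-2r) A_{n-1,r} \right) .
\]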
To conclude the argument, note that for $n \ge 2$, $Var_{U_n}(d) = \frac{n+1}{12}$, and
that $\frac{A_{n-1,\lfloor n/2 \rfloor}}{(n-1)!}$ is asymptotic to $\sqrt{\frac{6}{\pi n}}$ [CKSS].
Thus $n E_{U_n}(|G(W)|)$ is
bounded away from 0, as desired. □
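
The quantity $n E_{U_n}(|G(W)|)$ in Proposition 4.9 can also be evaluated numerically. The sketch below is a hedged illustration rather than the author's computation. It assumes the Eulerian-number expression $E_{U_n}(|G(W)|) = \frac{n-1}{2 n^2 (n-1)! \sqrt{Var_{U_n}(d)}} \sum_{r=0}^{n-1} |(r+1)A_{n-1,r+1} - (n-r)A_{n-1,r}|$ with $Var_{U_n}(d) = \frac{n+1}{12}$, and it cross-checks that expression against a brute-force evaluation of $G$ from Proposition 4.7, taking $c(\pi)$ to be $d(\pi)$ plus one when $\pi(n) > \pi(1)$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, sqrt

def eulerian_row(m):
    """A[k] = number of permutations of m symbols with k - 1 descents (zero padded)."""
    row = [0, 1, 0]
    for size in range(2, m + 1):
        new = [0] * (size + 2)
        for k in range(1, size + 1):
            new[k] = k * row[k] + (size - k + 1) * row[k - 1]
        row = new
    return row

def n_times_mean_abs_G(n):
    """n * E|G(W)| under the uniform distribution, via Eulerian numbers (assumed formula)."""
    A = eulerian_row(n - 1)
    total = sum(abs((r + 1) * A[r + 1] - (n - r) * A[r]) for r in range(n))
    exact_part = Fraction((n - 1) * total, 2 * n * factorial(n - 1))
    return float(exact_part) / sqrt((n + 1) / 12)

def n_times_mean_abs_G_bruteforce(n):
    """Same quantity, computed by enumerating S_n and using Proposition 4.7 directly."""
    d_count = [0] * n          # d_count[r] = number of permutations with r descents
    c_count = [0] * (n + 2)    # c_count[r] = number with r cyclic descents
    for p in permutations(range(n)):
        d = sum(p[i] > p[i + 1] for i in range(n - 1))
        c = d + (p[-1] > p[0])  # assumed definition of cyclic descents
        d_count[d] += 1
        c_count[c] += 1
    sd = sqrt((n + 1) / 12)
    mean_d = (n - 1) / 2
    total = 0.0
    for r in range(n):
        # Ratios of counts equal ratios of probabilities under the uniform measure.
        g = (c_count[r + 1] * (r + 1) * (n - 1) / (d_count[r] * n) - mean_d) / (n * sd)
        total += d_count[r] / factorial(n) * abs(g)
    return n * total

for n in (3, 10, 50, 200):
    print(n, n_times_mean_abs_G(n))
```

The exact rational part is formed before converting to floating point, so large factorials do not overflow. For $n = 3$ both routines give $1/\sqrt{3}$.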

4.3. A Sampling Algorithm. To conclude the paper, we record a fast
algorithm for drawing samples from the measure $R_{k,n}$. Since $C_{k,n}$ can be
obtained by sampling from $R_{k,n}$ and then performing uniformly at random
one of the $n$ cyclic rotations of the bottom row in the two line form of
a permutation, we only give an algorithm for sampling from $R_{k,n}$. This
algorithm should be quite useful for empirically studying the transition from
Poisson to normal behavior.

We use the terminology that if $\pi$ is a permutation on $n$ symbols, the
permutation $\tau$ on $n + 1$ symbols obtained by inserting $n + 1$ after position
$j$ (with $0 \le j \le n$) is defined by $\tau(i) = \pi(i)$ for $1 \le i \le j$, $\tau(j+1) = n + 1$,
and $\tau(i) = \pi(i - 1)$ for $j + 2 \le i \le n + 1$. For instance inserting 5 after
position 2 in the permutation 3 4 1 2 gives the permutation 3 4 5 1 2.

Proposition 4.10. Starting with the identity permutation in $S_1$, transition
from an element $\pi$ of $S_n$ to an element $\tau$ of $S_{n+1}$ by inserting $n + 1$ as
described by the following 2 cases:

(1) If either $j = n$ or $\pi(j) > \pi(j+1)$ and $1 \le j \le n - 1$, the chance of
inserting $n + 1$ after $j$ is $\frac{n + k - d(\pi)}{k(n+1)}$.

(2) If either $j = 0$ or $\pi(j) < \pi(j+1)$ and $1 \le j \le n - 1$, the chance of
inserting $n + 1$ after $j$ is $\frac{k - d(\pi) - 1}{k(n+1)}$.

Then after running the algorithm for $n - 1$ steps, the distribution obtained
on $S_n$ is precisely $R_{k,n}$.
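
The insertion procedure of Proposition 4.10 can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions, not code from the paper: the function names are hypothetical, and the closed form $R_{k,n}(\pi) = \binom{n + k - d(\pi) - 1}{n} / k^n$ used for the exact check is assumed here because it is consistent with the two probability ratios that appear in the proof.

```python
import random
from fractions import Fraction
from math import comb

def descents(pi):
    """Number of positions j with pi(j) > pi(j+1)."""
    return sum(pi[i] > pi[i + 1] for i in range(len(pi) - 1))

def insertion_weights(pi, k):
    """Exact chance of inserting n+1 after position j, for j = 0, ..., n."""
    n, d = len(pi), descents(pi)
    case1 = Fraction(n + k - d, k * (n + 1))  # j = n, or a descent at position j
    case2 = Fraction(k - d - 1, k * (n + 1))  # j = 0, or an ascent at position j
    return [case1 if j == n or (1 <= j <= n - 1 and pi[j - 1] > pi[j]) else case2
            for j in range(n + 1)]

def sample_R(k, n, rng=random):
    """Draw one permutation of 1..n by running the n - 1 insertion steps."""
    pi = [1]
    for m in range(1, n):
        weights = [float(w) for w in insertion_weights(pi, k)]
        j = rng.choices(range(m + 1), weights=weights)[0]
        pi.insert(j, m + 1)  # list index j places m+1 after the first j entries
    return pi

def exact_distribution(k, n):
    """Exact law of the output, obtained by following every insertion path."""
    dist = {(1,): Fraction(1)}
    for m in range(1, n):
        new = {}
        for pi, p in dist.items():
            for j, w in enumerate(insertion_weights(pi, k)):
                if w:
                    tau = pi[:j] + (m + 1,) + pi[j:]
                    new[tau] = new.get(tau, 0) + p * w
        dist = new
    return dist

def R(pi, k):
    """Assumed closed form for R_{k,n}(pi)."""
    n = len(pi)
    return Fraction(comb(n + k - descents(pi) - 1, n), k ** n)

print(sample_R(3, 8))
```

For small $n$ one can compare `exact_distribution(k, n)` with `R` entry by entry. Each draw costs $O(n^2)$ operations in this naive form, since the descent count is recomputed at every step; tracking it incrementally would reduce the cost.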

Proof. First note that the transition probabilities of the algorithm sum to
1, since
\[
(d(\pi) + 1) \left( \frac{n + k - d(\pi)}{k(n+1)} \right)
+ (n - d(\pi)) \left( \frac{k - d(\pi) - 1}{k(n+1)} \right) = 1 .
\]
Now observe that if $\tau$ is obtained from $\pi$ by a Case 1 move, then
$d(\tau) = d(\pi)$. Thus from the formula for $R_{k,n}$,
$\frac{R_{k,n+1}(\tau)}{R_{k,n}(\pi)} = \frac{n + k - d(\pi)}{k(n+1)}$. Similarly
if $\tau$ is obtained from $\pi$ by a Case 2 move, then $d(\tau) = d(\pi) + 1$. Thus
\[
\frac{R_{k,n+1}(\tau)}{R_{k,n}(\pi)} = \frac{k - d(\pi) - 1}{k(n+1)}
\]